-
-
-
- Uniform Resource Locators Tim Berners-Lee
- INTERNET DRAFT CERN
- IETF URL Working Group 14 July 1993
-
-
-
- Uniform Resource Locators
-
-
- Status of this memo
-
- This document is an Internet Draft. Internet Drafts are
- working documents of the Internet Engineering Task Force
- (IETF), its Areas, and its Working Groups. Note that other
- groups may also distribute working documents as Internet
- Drafts.
- Internet Drafts are working documents valid for a maximum of
- six months. Internet Drafts may be updated, replaced, or
- obsoleted by other documents at any time. It is not
- appropriate to use Internet Drafts as reference material or to
- cite them other than as a "working draft" or "work in
- progress".
- Distribution of this document is unlimited. Please send
- comments to the author at timbl@info.cern.ch, or to the
- discussion list ietf-url@merit.edu.
-
- Abstract
-
- Many protocols and systems for document search and retrieval
- are currently in use, and many more protocols or refinements
- of existing protocols are to be expected in a field whose
- expansion is explosive.
- These systems are aiming to achieve global search and
- readership of documents across differing computing platforms,
- and despite a plethora of protocols and data formats. As
- protocols evolve, gateways can allow global access to remain
- possible. As data formats evolve, format conversion programs
- can preserve global access. There is one area, however, in
- which it is impractical to make conversions, and that is in
- the names and addresses used to identify objects. This is
- because names and addresses of objects are passed on in so
- many ways, from the backs of envelopes to hypertext objects,
- and may have a long life.
- This paper discusses the requirements on a universal syntax
- which can be used to refer to objects available using existing
- protocols, and may be extended with technology. It makes a
- recommendation for a generic syntax, and for specific forms
- for "Uniform Resource Locators" (URLs) of objects accessible
- using existing Internet protocols.
-
- Uniform Resource Locators Berners-Lee
-
-
-
- Terms
-
- The objects on the network which are to be named and
- addressed include typically objects which can be retrieved,
- and objects which can be searched. There is a great variety
- of other objects which may support other operations. We imply
- nothing about the contents of objects in this document.
- Whereas human-readable documents are currently the center of
- interest of the field, we envisage all aspects discussed in
- this paper applying to generalised objects when systems to
- handle them become available. The "object" is the unit of
- reference and need not correspond to any unit of storage. We
- refer to objects which can be searched as "indexes". We
- emphasise that this is the abstract view of the client, and
- these objects need not correspond to physical files on
- computers. We refer to the person who does the retrieval or
- searching as the user.
- Within this document, we use the term "name" very generally
- for a string of characters describing an object, whatever its
- combination of properties mentioned below. (The term usually
- has a narrower meaning, but we needed some term for the
- universal set.) The term "address" is reserved for a string
- which specifies a more or less physical location. The term
- "locator" refers to a URL as here defined.
-
- Requirements
-
- This section discusses requirements for URLs, as an
- introduction to and background for the Recommendations
- section.
-
- Uses of names and addresses
-
- A name allows a user, with the help of a "client" program,
- to retrieve or operate on objects via a "server" program. A
- name may be passed for example:
- - In communication of any form between two people, to
- refer to a document, or part of a document;
- - As part of the description of a link associated with
- a hypertext document;
- - As part of the result of searching an index.
- Some typical requirements on a name which are met to a
- varying degree by various schemes are for example that the
- name is
- Persistent A given name will remain valid as long as it
- is needed;
- Extensible A given naming syntax will remain valid
- through the introduction of new protocols and
- directory technologies;
- Resolvable A name will contain enough information to
- allow the document or index to which it
-
-
- Internet Draft 2 March 1993Uniform Resource Locators Berners-Lee
-
-
- refers to be accessed, perhaps via resolution
- into an intermediate, more physical, name.
- Unique Each object can only have one such name. The
- fact that two such names are different
- implies that the objects to which they refer
- are different (in some way).
- Unambiguous The fact that two names are identical implies
- that the objects named are the same (in some
- way).
- The syntax discussed is the syntax of one name, be it a
- lasting name or a physical address. When a directory server
- or hypertext link contains a set of alternative names, then
- that is beyond the scope of this syntax. Similarly, a syntax
- for describing a compound object is outside the scope of this
- syntax. The specific locator name spaces (defined under the
- umbrella of the general syntax) each meet the requirements
- above to a greater or lesser extent.
-
- Current practice
-
- Current protocols use many different standards for names.
- For some protocols, such as ISO-10163 Search and Retrieve
- protocol[16], the names returned in a search are only valid
- during the session. For others, such as FTP[9], they are
- lasting names which may be used for object retrieval at a
- later time. Typically, however, they are not long-lasting
- names which are independent of the location of the object.
- Such names may be provided using directory servers such as
- x.500. They will refer to the registration, however formal or
- informal, of an object with a particular organisation or
- person. Both hypertext and manual references rely on long-
- lasting names.
- Current names are basically location specifiers (addresses).
- These may be known as Uniform Resource Locators (URLs). They
- give the necessary parts of an address for a reader to access
- an information provider using the given protocol, and ask for
- the object required. Examples of names used by various
- protocols include
-
- File Transfer Protocol (Postel 1985)
-     Host name or IP-address
-     [TCP port]
-     [user name, password]
-     Filename
-
- W.A.I.S. (Kahle 1990)
-     Host name or IP-address
-     [TCP port]
-     database name
-     local document id
-
- Gopher (Alberti 1991)
-     Host name or IP-address
-     [TCP port]
-     database name
-     selector string
-
- HTTP (Berners-Lee 1991)
-     Host name or IP-address
-     [TCP port]
-     local object id
-
- NNTP (Kantor 1986) group
-     Group name
-
- NNTP article
-     Host name
-     unique message identifier
-
- Prospero links (Neuman 1992)
-     Host name or IP address
-     [UDP port]
-     Host specific object name
-     [version]
-     [identifier]*
-
- x.500 distinguished name
-     Country
-     Organisation
-     Organisational unit
-     Person
-     Local object identifier
-
-
- Other systems with their own naming schemes include the BITNET
- "LISTSERV" application, FTAM file retrieval, SQLnet(TM) remote
- database search, proprietary distributed file systems, etc.
- Conventional syntax for writing these addresses involves
- various forms of punctuation to separate these parts. This
- sometimes, but not always, allows the naming scheme to be
- deduced from the punctuation. For example, a name of the form
- xxx.yyy.zz.edu:/pub.aa.bb.cc often implies anonymous FTP
- access. However, there is no well-defined algorithm for
- parsing an arbitrary name, as there is no common syntax.
-
-
- Expandability
-
-
- There will necessarily be a phase during which lasting names
- will become more common, as the deployment of directory
- services increases to the point where every user has direct
- or indirect access to one. Even then, however, one can
- envisage more than one competing directory system, and cases
- in which physical names are still required. A directory
- service takes a lasting name and reduces it to a physical
- address (or set of addresses) which, though less useful for
- lasting reference, is the only way to actually retrieve the
- object.
-
- An addressing syntax is required which will be able to
- encompass existing physical address spaces, and be extendible
- to any future protocols. This requires that it contain an
- identifier for the protocol in use. The format of the rest of
- the address will necessarily depend to a certain extent on the
- protocol.
-
-
- Relevance
-
- The life of a name is limited by any information contained
- within it which may become prematurely invalid. It is
- therefore necessary to limit the contents of a name to the
- information required for the operations above. Other
- extraneous information about the object (its size, data
- format, authorisation details, etc.) may in general change
- with time and should not be part of the name.
- One might expect such information to be part of the "header"
- of an object, and for protocols to allow the header information
- to be retrieved independently of the objects themselves.
- Any physical address may be subject to change with time:
- hence we encourage the move to lasting names and directory
- services.
-
- Uniqueness
-
- Clearly one requires unambiguous names in the sense that one
- name should refer to only one logical object. This is the case
- with all the addressing schemes in use, whether they are
- directory systems or physical addresses. (The internet
- addresses all rely on the domain name (Mockapetris 1987) of
- the host to achieve this).
- However, given that names can be translated, many apparently
- different names may lead to the same object. Any object may
- therefore be referred to by many names. One needs to be able
- to know whether two objects, retrieved through different
- paths, are in fact the same object.
- It is suggested that each object have a unique "official"
- name. This name could be stored in the object in some
- representations, or stored in a database accessible to the
- server, for example. Any references within that object
- should be parsed in the context of the official name. In the
- presence of a directory service, the official name will
- normally be the registered name of the object. However, a name
- in any scheme will do, so long as it is completely specified.
- On systems which do not allow the name to be stored (such as
- anonymous FTP archive sites), a possible ambiguity will always
- exist as to whether two similarly named objects are in fact
- the same.
- Note that Internet newsgroup names are unique world-wide,
- and news articles carry a unique message id.
- In most other cases, however, there is no guarantee that
- dereferencing a URL will work, or that if it does the object
- it refers to will in fact be the object intended. URLs such
- as FTP addresses are transient in that files may be moved and
- even replaced by different files of the same name. This
- disorganisation may be limited by good server management, but
- a naming scheme which is independent also of internet host
- name is obviously preferable.
-
- Readability by people
-
- This requirement has been put forward by several people
- (Clifford Lynch, Douglas Engelbart among others), and disputed
- by others. The author's view is that it will be a while
- before technology and standardisation have reached the point
- at which names and addresses will be hidden from human beings.
- As long as they must be written on the backs of envelopes and
- "cut and pasted" between workstation windows, there is a
- strong need for names to be
- . Short
- . Composed of printable (preferably non-white)
- characters
- . To a certain extent, understandable by a human being.
-
-
- Structure of names and addresses.
-
-
- A physical address is required in order for
-
- . The user's program to contact the server
- . The server to search an index, retrieve an object,
- or look up the name;
- . The user's program to locate an individual position
- or element within an object.
-
- This suggests that a name be structured, such that the parts
- necessary for these three operations be separate and only
- used by those system elements which need those parts. This
- corresponds to the basic principle of information hiding. In
- fact, four parts are necessary, including the indicator of
- the naming scheme to be used:
-
- . The naming scheme: a registered identifier for the
- protocol.
- . The name of a suitable server. The format of this
- part must be well defined. It will depend on the
- lower-layer protocols in use. Systems which use
- widely distributed information, such as x.500 and
- NNTP, do not need this part as each client generally
- contacts its nearest server (or a particular
- server).
- . Information to be passed to the server. This may be
- private to the server, as all names may be generated
- and used by the same server. This part of the name
- should be opaque to the client.
- . Information to be used by the application once the
- object has been retrieved. This part is private to
- the application (or, more strictly, the data format)
- and so cannot be defined here.
-
- Both lasting names and physical addresses often share a
- hierarchical structure. This often follows from the
- organisation of the system. From the naming point of view, it
- has the advantage that a reference in one object to another
- object need not include that part of the structure which is
- common to both names.
-
- Choices
-
- The requirements above leave little room for choice save for
- the order and punctuation of the elements of an address. It
- is only reasonable for the order of writing of the parts to be
- consistently from left to right (or right to left) with
- increasing specificity. Punctuation schemes fall into two
- categories (Huitema 1991): tagged schemes in which fields are
- given names, and schemes which use special characters and field
- order. The latter tend to be more compact.
-
-
-
- protocol: aftp host: xxx.yyy.edu path: /pub/doc/README
-
- PR=aftp; H=xx.yy.edu; PA=/pub/doc/README;
-
- PR:aftp/xx.yy.edu/pub/doc/README
-
- /aftp/xx.yy.edu/pub/doc/README
-
-
-
- Fig 1. Some alternative tagged and untagged representations
-
- The choice of special symbols for punctuation tends to be a
- matter of taste. It is easier to read addresses whose symbols
- correspond to those of one's favourite operating system. A
- variety of symbols is needed so that when a name is
- abbreviated it is possible to tell which parts have been
- omitted. The recommendation below uses special characters in
- order to achieve a compact name, and uses where possible
- punctuation symbols established in the internet or unix
- community.
- The choice of escape character for introducing
- representations of non-allowed characters also tends to be a
- matter of taste. An ANSI standard exists in the C language,
- using the back-slash character "\". The use of this character
- on unix command lines, however, can be a problem as it is
- interpreted by many shell programs, and would have itself to
- be escaped.
- The use of white space characters has been avoided in URLs:
- spaces are not legal characters. This was done because of
- the frequent introduction of extraneous white space when lines
- are wrapped by systems such as mail, or sheer necessity of
- narrow column width, and because of the inter-conversion of
- various forms of white space which occurs during character
- code conversion and the transfer of text between applications.
-
-
- Recommendations
-
- This section describes the syntax for "Uniform Resource
- Locators" (URLs): that is, basically physical addresses of
- objects which are retrievable using protocols already deployed
- on the net. The generic syntax provides a framework for new
- schemes for names to be resolved using as yet undefined
- protocols.
- The syntax is described in two parts. Firstly, we give the
- syntax rules of a completely specified name; secondly, we
- give the rules under which parts of the name may be omitted in
- a well-defined context.
-
- Full form
-
- A complete URL consists of a naming scheme specifier
- followed by a string whose format is a function of the naming
- scheme. For locators of information on the internet, a common
- syntax is used for the IP address part. A BNF description of
- the URL syntax is given in a later section. The components
- are as follows.
-
- Fragment-id
-
- This represents a part of, fragment of, or a sub-function
- within, an object. Its syntax and semantics are
- defined by the application responsible for the object, or the
- specification of the content type of the object. The only
- definition here is of the allowed characters by which it may
- be represented in a URL.
- The fragment-id follows the URL of the whole object from
- which it is separated by a hash sign (#). If the fragment-id
- is void, the hash sign may be omitted: A void fragment-id with
- or without the hash sign means that the URL refers to the
- whole object.
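-
- As a minimal sketch of the rule above, and not part of the
- recommendation itself, the following Python fragment splits a
- reference at the hash sign. The function names and the example
- host name used with them are assumptions made for illustration.
-
```python
def split_fragment(reference):
    """Split a reference into (url, fragment_id) at the first '#'.

    A void fragment-id, with or without the hash sign, comes back as
    the empty string, meaning the reference is to the whole object.
    """
    url, _hash, fragment_id = reference.partition("#")
    return url, fragment_id


def refers_to_whole_object(reference):
    """True when the fragment-id is void."""
    return split_fragment(reference)[1] == ""
```
-
- The interpretation of a non-void fragment-id is left to the
- application, so the sketch returns it unchanged.
-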
- While this hook is allowed for identification of fragments,
- the question of addressing of parts of objects, or of the
- grouping of objects and relationship between contained and
- containing objects, is not addressed by this document.
- This document does not address the question of objects which
- are different versions of a "living" object, nor of expressing
- the relationships between different versions and the living
- object.
-
- Scheme
-
-
- Within the URL of an object, the first element is the name of
- the scheme, separated from the rest of the URL by a colon.
- The rest of the URL follows the colon in a format depending on
- the scheme.
-
-
- Internet protocol parts
-
-
- Those schemes which refer to internet protocols have a
- common syntax for the rest of the object name. This starts
- with a double slash "//" to indicate its presence, and
- continues until the following slash "/". Within that section
- are
-
- . An optional user name, if this must be quoted to the
- server, followed by a commercial at sign "@". (Use
- of this field is discouraged. Provision of encoding
- a password after the user name, delimited by a
- colon, could be made but obviously is only useful
- when the password is public, in which case it
- should not be necessary, so that is also
- discouraged.)
- . The internet domain name of the host in RFC 1034
- format (or, optionally and less advisably, the IP
- address as a set of four decimal numbers separated
- by periods)
- . The port number, if it is not the default number for
- the protocol, is given in decimal notation after a
- colon.
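-
- The section between the double slash and the following slash
- could be taken apart as in the Python sketch below. This is an
- illustration under stated assumptions, not a definitive parser:
- the names InternetPart and parse_internet_part are invented
- here, the input is assumed to be the text that follows the
- scheme and its colon, and any password is left inside the user
- field untouched.
-
```python
from typing import NamedTuple, Optional


class InternetPart(NamedTuple):
    user: Optional[str]   # optional user name quoted to the server
    host: str             # domain name or dotted decimal address
    port: Optional[int]   # None means "default for the protocol"
    rest: str             # everything from the following slash on


def parse_internet_part(after_scheme):
    """Parse '//[user@]host[:port]' followed by an optional '/path'."""
    if not after_scheme.startswith("//"):
        raise ValueError("internet protocol part must start with '//'")
    section, slash, tail = after_scheme[2:].partition("/")
    # The user name, if any, precedes the commercial at sign.
    user, at, hostport = section.rpartition("@")
    host, colon, port_text = hostport.partition(":")
    if not host:
        raise ValueError("missing host name")
    port = None
    if colon:
        if not (port_text.isascii() and port_text.isdigit()):
            raise ValueError("port must be given in decimal notation")
        port = int(port_text)
    return InternetPart(user if at else None, host, port, slash + tail)
```
-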
-
-
- Path
-
- The rest of the locator is known as the "path". It may
- define details of how the client should communicate with the
- server, including information to be passed transparently to
- the server without any processing by the client.
- The path is interpreted in a manner dependent on the
- protocol being used. However, when it contains slashes, these
- must imply a hierarchical structure.
-
- Partial form
-
- In a certain limited set of cases, generally within a
- certain application, it may be useful to pass only a section
- of the URL. Within an object whose URL is well defined, the URL
- of another object may be given in abbreviated form, where
- parts of the two URLs are the same. This allows objects within
- a group to refer to each other without requiring the space for
- a complete reference, and it incidentally allows the group of
- objects to be moved without changing any references. This is
- not discussed in detail here; it is only mentioned so that the
- characters required by the technique be reserved for that
- purpose. It must be emphasised that when a reference is
- passed in anything other than a well controlled context, the
- full form must always be used.
- The partial form relies on a property of the URL syntax that
- certain characters ("/") and certain path elements ("..", ".")
- have a significance reserved for representing a hierarchical
- space, and must be recognised as such by both clients and
- servers.
- A partial form can be distinguished from a full form in that
- a full form must have a colon and that colon must occur before
- any slash characters.
- The rules for the use of a partial name are:
- . If the scheme parts are different, the whole
- absolute locator must be given. Otherwise, the
- scheme is omitted, and:
- . If the host and/or port parts are different, the
- host, port and all the rest of the locator must
- be given.
- . If the scheme and host parts are the same, then the
- path may be given in absolute (fully qualified) or
- relative form. Within the path:
- . If a leading slash is present, the path is absolute.
- Otherwise, a relative path is interpreted as
- follows:
- . The last part of the path of the context locator
- (anything following the rightmost slash) is removed,
- and the given partial URL appended in its place.
- . Within the result, all occurrences of "/xxx/.." or
- "/." are recursively removed, where xxx, ".." and
- "." are complete path elements.
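-
- The rules above can be sketched in Python as follows. This is
- a hedged illustration rather than a reference implementation:
- the function names are invented here, the context locator is
- assumed to be in full form with no search part or fragment-id,
- and the example URLs are hypothetical.
-
```python
def is_full_form(reference):
    """A full form has a colon that occurs before any slash."""
    colon = reference.find(":")
    slash = reference.find("/")
    return colon != -1 and (slash == -1 or colon < slash)


def normalise_path(path):
    """Remove '/xxx/..' and '/.' where these are complete elements."""
    kept = []
    for element in path.split("/"):
        if element == ".":
            continue
        if element == ".." and kept and kept[-1] not in ("", ".."):
            kept.pop()          # cancel the preceding element
        else:
            kept.append(element)
    return "/".join(kept)


def resolve(context, partial):
    """Expand a partial form against a full-form context locator."""
    if is_full_form(partial):
        return partial
    scheme, _colon, rest = context.partition(":")
    if partial.startswith("//"):
        # Host and/or port differ: only the scheme is inherited.
        return scheme + ":" + partial
    if rest.startswith("//"):
        end = rest.find("/", 2)
        if end == -1:
            end = len(rest)
        site, path = rest[:end], rest[end:] or "/"
    else:
        site, path = "", rest
    if partial.startswith("/"):
        new_path = partial      # absolute path
    else:
        # Drop what follows the rightmost slash, then append.
        new_path = path[: path.rfind("/") + 1] + partial
    return scheme + ":" + site + normalise_path(new_path)
```
-
- For example, with the context locator
- http://info.cern.ch/a/b/c.html, the partial form ../x/y.html
- expands to http://info.cern.ch/a/x/y.html.
-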
-
-
- Encoding prohibited characters
-
- When a system uses a local addressing scheme, it is useful
- to provide a mapping from local addresses into URLs so that
- references to objects within the addressing scheme may be
- referred to globally, and possibly accessed through gateway
- servers.
- Any mapping scheme may be defined provided it is
- unambiguous, reversible, and provides valid URLs. It is
- recommended that where hierarchical aspects to the local
- naming scheme exist, they be mapped onto the hierarchical URL
- path syntax in order to allow the partial form to be used.
- The following encoding method is used for mapping WAIS, FTP,
- Prospero and Gopher addresses onto URLs. Where the local
- naming scheme uses ASCII characters which are not allowed in
- the URL, these may be represented in the URL by a percent
- sign "%" followed by two hexadecimal digits (0-9, A-F) giving
- the ISO Latin 1 code for that character. Character codes
- other than those allowed by the syntax shall not be used in a
- URL.
- The same encoding method may be used for encoding characters
- whose use, although technically allowed in a URL, would be
- unwise due to problems of corruption by imperfect gateways or
- misrepresentation due to the use of variant character sets, or
- which would simply be awkward in a given environment. As a %
- sign always indicates an encoded character, a URL may be made
- safer simply by encoding any characters considered unsafe,
- while leaving already encoded characters still encoded.
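-
- The encoding method can be illustrated with the Python sketch
- below. It is a sketch only: the function names are invented
- here, and EXAMPLE_ALLOWED is a deliberately small stand-in, not
- the set of characters this document allows in a URL. Callers
- are expected to pass in whichever set applies.
-
```python
import re
import string

# Assumption for illustration only; NOT the allowed set of the URL syntax.
EXAMPLE_ALLOWED = string.ascii_letters + string.digits + "."

_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def encode_char(ch):
    """Return '%' followed by two hex digits of the ISO Latin 1 code."""
    code = ord(ch)
    if code > 0xFF:
        raise ValueError("character has no ISO Latin 1 code")
    return "%{:02X}".format(code)


def encode_local_name(name, allowed=EXAMPLE_ALLOWED):
    """Map a local name onto URL characters; '%' itself is always encoded."""
    return "".join(
        ch if (ch in allowed and ch != "%") else encode_char(ch)
        for ch in name
    )


def make_safer(url, unsafe):
    """Encode characters in `unsafe`, leaving existing %XX escapes alone."""
    out = []
    i = 0
    while i < len(url):
        if _ESCAPE.match(url, i):
            out.append(url[i:i + 3])   # already encoded: keep as is
            i += 3
        else:
            ch = url[i]
            out.append(encode_char(ch) if ch in unsafe else ch)
            i += 1
    return "".join(out)


def decode(url):
    """Reverse the mapping by replacing each %XX with its character."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(0)[1:], 16)), url)
```
-
- Because "%" is always encoded when a local name is mapped,
- the mapping stays reversible: decode(encode_local_name(s))
- gives back s for any ISO Latin 1 string s.
-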
- (Note: If a new naming scheme is introduced which encodes
- binary data as opposed to text, then a more compact encoding
- such as pure hex or base 64 would be more appropriate.)
- The same considerations apply to mapping local fragment
- identifiers onto the fragmentid part of a URL.
-
- Specific Naming Schemes
-
- The mapping for some existing standard and experimental
- protocols is outlined in the BNF syntax definition. Notes on
- particular protocols follow.
-
- HTTP
-
- The HTTP protocol specifies that the path is handled
- transparently by those who handle URLs, except for the servers
- which dereference them. The path is passed by the client to
- the server with any request, but is not otherwise understood
- by the client. The fragmentid part is not sent with the
- request. The search part, if present, is sent.
-
- FTP
-
- The ftp: prefix indicates a file which is to be picked up
- from the file system of the given host. The FTP protocol is
- used. The port number if given gives the port of the FTP
- server if not the FTP default. (A client may in practice use
- local file access to retrieve objects which are available
- through more efficient means such as local file open or NFS
- mounting, where this is available and equivalent.)
- The syntax allows for the inclusion of a user name and even
- a password for those systems which do not use the anonymous
- FTP convention. The default, however, if no user or password
- is supplied, will be to use that convention, viz. that the
- user name is "anonymous" and the password the user's mail
- address.
- The adoption of a unix-style syntax involves the conversion
- into non-unix local forms by either the client or server. Some
- non-unix servers do this, but clients wishing to access sites
- which do not have unix-style naming will need certain
- algorithms to enable other file systems to be identified and
- treated. Client software may also have to be flexible in
- terms of the sequence of FTP commands used with different
- varieties of server. In view of a tendency for file systems
- to look increasingly similar, it was felt that the URL
- convention should not be weighed down by extra mechanisms for
- identifying these cases.
- The data format of a file can only, in the general FTP case,
- be deduced from the name, normally the suffix of the name.
- This is not standardised. The transfer mode (binary or text)
- must in turn be deduced from the data format. It is
- recommended that conventions for suffixes of public archives
- be established, but this is outside the scope of this paper.
-
- News
-
- The news locators refer to either news group names or
- article message identifiers which must conform to the rules of
- RFC 850. A message identifier may be distinguished from a
- news group name by the presence of the commercial at "@"
- character. These rules imply that within an article, a
- reference to a news group or to another article will be a
- valid URL (in the partial form).
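-
- A small Python sketch of this distinction follows. The
- function name classify_news and the example identifiers are
- assumptions for illustration. The "*" alternative is taken
- from the BNF, which lists it without further description, so
- the sketch only reports it and attaches no meaning to it.
-
```python
def classify_news(locator):
    """Return (kind, body) for a news locator in full or partial form."""
    prefix = "news:"
    body = locator[len(prefix):] if locator.startswith(prefix) else locator
    if body == "*":
        return ("star", body)
    # A message identifier contains the commercial at character.
    if "@" in body:
        return ("article", body)
    return ("group", body)
```
-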
- Note: An outstanding problem is that the message identifier
- is insufficient to allow the retrieval of an expired article,
- as no algorithm exists for deriving an archive site and
- filename. The addition of the date and news group set to the
- article's URL would allow this if a directory existed of
- archive sites by news group. This is a suggested subject of
- study in conjunction with the NNTP WG. A possible further
- extension is to allow the naming of subject threads as
- addressable objects.
-
- WAIS
-
- The current public domain WAIS implementation requires that
- a client know the "type" and length of an object prior to
- retrieval. These values are returned along with the internal
- object identifier in the search response. They have been
- encoded into the path part of the URL in order to make the URL
- sufficient for the retrieval of the object. If changes to
- WAIS specifications make the internal id something which is
- sufficient for later retrieval then this will not be
- necessary.
- Within the WAIS world, names do not, of course, need to be
- prefixed by "wais:" (by the partial form rules).
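-
- As an illustration, the two WAIS forms given in the BNF
- (waisindex and waisdoc) might be told apart as in the Python
- sketch below. The function name, the dictionary keys and the
- example host are assumptions; in particular, reading the
- "digits" element as the object length follows the text above
- and is an interpretation, not something the BNF states.
-
```python
def parse_wais(url):
    """Parse a full-form WAIS URL into an index or a document record."""
    prefix = "wais://"
    if not url.startswith(prefix):
        raise ValueError("not a full-form WAIS URL")
    hostport, _slash, rest = url[len(prefix):].partition("/")
    parts = rest.split("/", 3)
    if len(parts) == 1:
        # waisindex: database with an optional search part
        database, question, search = parts[0].partition("?")
        return {"kind": "index", "hostport": hostport,
                "database": database,
                "search": search if question else None}
    if len(parts) == 4 and parts[2].isascii() and parts[2].isdigit():
        # waisdoc: database / wtype / digits / path
        return {"kind": "doc", "hostport": hostport,
                "database": parts[0], "wtype": parts[1],
                "length": int(parts[2]), "docid": parts[3]}
    raise ValueError("neither a waisindex nor a waisdoc")
```
-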
-
- Prospero
-
- The Prospero (Neuman, 1991) directory service is used to
- resolve the URL yielding an access method for the object
- (which can then itself be represented as a URL if translated).
- The host part contains a host name or internet address. The
- port part is optional. The path part contains a host specific
- object name, an optional version number, and an optional list
- of attributes. If these latter fields are present they are
- separated from the host specific object name and from each
- other by the characters "%00" (percent, zero, zero), this
- being an escaped string terminator (null). If the optional
- list of attributes is provided, the version number must be
- present, but may be the empty string (i.e. the first attribute
- would be separated from the host specific name by "%00%00").
- External Prospero links are represented directly as URLs of
- the underlying access method and are not represented as
- Prospero URLs.
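-
- The field separation described above can be sketched in
- Python as follows. The function name and the example object
- names are hypothetical, and the sketch assumes it is handed
- the still-encoded path part of a Prospero URL.
-
```python
def parse_prospero_path(path):
    """Split a Prospero path into (hsoname, version, attributes).

    Fields are separated by the escaped null "%00". The version is
    None when absent and the empty string when present but empty.
    """
    fields = path.split("%00")
    hsoname = fields[0]
    version = fields[1] if len(fields) > 1 else None
    attributes = fields[2:]
    return hsoname, version, attributes
```
-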
-
- Gopher
-
- The first character of the URL path part (after the initial
- single slash) is a single-character "type" field which is that
- used by the Gopher protocol. The rest of the path is the
- "selector string", with disallowed characters encoded. Note
- that some selector strings begin with a copy of the gopher
- type character, in which case that character will occur twice
- consecutively in the URL. If the type character and selector
- are omitted, the type defaults to "1".
- Gopher links which refer to different protocols should be
- converted into URLs for those protocols.
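-
- A minimal Python sketch of the Gopher path convention
- follows. The function name is an assumption, the selector is
- returned still encoded, and any search part is assumed to
- have been removed beforehand.
-
```python
def parse_gopher_path(path):
    """Return (gtype, selector) from the path part of a Gopher URL."""
    # Skip the initial single slash, if present.
    body = path[1:] if path.startswith("/") else path
    if not body:
        return ("1", "")        # type defaults to "1" when omitted
    # The first character is the type; the rest is the selector string.
    return (body[0], body[1:])
```
-
- With the path /00/about.txt the type is "0" and the selector
- is "0/about.txt", showing the case where the selector begins
- with a copy of the type character.
-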
-
- Telnet, rlogin, tn3270
-
- The use of URLs to represent interactive sessions is a
- convenient extension to their uses for objects. This allows
- access to information systems which only provide an
- interactive service, and no information server. As
- information within the service cannot be addressed
- individually or, in general, automatically retrieved, this is
- a less desirable, though currently common, solution.
-
- x500
-
- The mapping of x500 names onto URLs is not defined here. A
- decision is required as to whether "distinguished names" or
- "user friendly names" (ufn), or both, should be allowed. If
- any punctuation conversions are needed from the adopted x500
- representation (such as the use of slashes between parts of a
- ufn) they must be defined. This is a subject for study.
-
- WHOIS
-
- This prefix describes the access using the "whois++" scheme
- in the process of definition. The hostname part is the same as
- for other IP based schemes. The path part can be either a
- whois handle for a whois object, or it can be a valid whois
- query string. This is a subject for further study.
-
- Network Management Database
-
- This is a subject for study.
-
-
- Registration of naming schemes
-
- A new naming scheme may be introduced by defining a mapping
- onto a conforming URL syntax, using a new scheme identifier.
- Experimental scheme identifiers may be used by mutual
- agreement between parties, and must start with the characters
- "x-". The scheme name "urn:" is reserved for the work in
- progress on a scheme for more persistent names. Therefore
- URNs (Names) and URLs (Locators) will be distinguishable. An
- object which is either a URL or a URN is known as a URI
- (Identifier).
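-
- The two naming conventions just stated can be checked
- mechanically, as in the Python sketch below. The function name
- and return labels are invented for illustration. The sketch
- cannot tell whether a scheme is registered, and it compares
- names exactly as written because case handling is not
- specified here.
-
```python
def classify_scheme(scheme):
    """Classify a scheme name given without its trailing colon."""
    if scheme.startswith("x-"):
        return "experimental"   # by mutual agreement between parties
    if scheme == "urn":
        return "urn"            # reserved for more persistent names
    return "other"              # registered or unknown
```
-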
- It is proposed that the Internet Assigned Numbers Authority
- (IANA) perform the function of registration of new schemes.
- Any submission of a new URL scheme must include a definition
- of an algorithm for the retrieval of any object within that
- scheme. The algorithm must take the URL and produce either a
- set of URL(s) which will lead to the desired object, or the
- object itself, in a well-defined or determinable format. It is
- recommended that those proposing a new scheme demonstrate its
- utility and operability by the provision of a gateway which
- will provide images of objects in the new scheme for clients
- using an existing protocol. If the new scheme is not a
- locator scheme, then the properties of names in the new space
- should be clearly defined.
- It is likewise recommended that, where a protocol allows for
- retrieval by URI, the client software have provision for
- being configured to use specific gateway locators for indirect
- access through new naming schemes.
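-
- One way a client could be configured with gateway locators is
- sketched below in Python. The table contents, the host
- gateway.example, the set of natively handled schemes, and the
- convention of appending the escaped URI to the gateway locator
- are assumptions made for illustration only.
-
```python
from urllib.parse import quote

# Hypothetical client configuration: scheme name -> gateway locator.
GATEWAYS = {"x-demo": "http://gateway.example/fetch/"}

# Schemes this hypothetical client can retrieve directly.
NATIVE_SCHEMES = {"http", "ftp", "news", "gopher", "wais", "telnet"}


def locator_to_fetch(uri: str) -> str:
    """Return the URL the client should actually request for this URI."""
    scheme = uri.partition(":")[0].lower()
    if scheme in NATIVE_SCHEMES:
        return uri  # direct access, no gateway needed
    if scheme not in GATEWAYS:
        raise LookupError("no gateway configured for scheme " + repr(scheme))
    # Assumption: the gateway accepts the original URI, fully escaped,
    # appended to its own locator.
    return GATEWAYS[scheme] + quote(uri, safe="")


print(locator_to_fetch("x-demo:some/object"))
```
-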
-
- BNF syntax
-
- This is a BNF-like description of the Uniform Resource
- Locator syntax. A vertical line "|" indicates alternatives,
- and [brackets] indicate optional parts. Spaces are
- represented by the word "space". Single letters stand for
- single letters. All words of more than one letter below are
- entities described somewhere in this description.
-
-   fragmentaddress  url [ # fragmentid ]
-   url              generic | httpaddress | fileaddress |
-                    newsaddress | prosperoaddress | telnetaddress
-                    | gopheraddress | waisaddress | afsaddress
-   generic          scheme : path [ ? search ]
-   scheme           ialpha
-   httpaddress      h t t p : / / hostport [ / path ] [ ? search ]
-   fileaddress      f t p : / / host / path
-   afsaddress       a f s : / / cellname / path
-   newsaddress      n e w s : groupart
-   waisaddress      waisindex | waisdoc
-   waisindex        w a i s : / / hostport / database [ ? search ]
-   waisdoc          w a i s : / / hostport / database / wtype /
-                    digits / path
-   groupart         * | group | article
-   group            ialpha [ . group ]
-   article          xalphas @ host
-   database         xalphas
-   wtype            xalphas
-   prosperoaddress  prosperolink
-   prosperolink     p r o s p e r o : / / hostport / hsoname
-                    [ % 0 0 version [ attributes ] ]
-   hsoname          path
-   version          digits
-   attributes       attribute [ attributes ]
-   attribute        alphanums
-   telnetaddress    t e l n e t : / / [ user @ ] hostport
-   gopheraddress    g o p h e r : / / hostport [ / gtype [
-                    selector ] ] [ ? search ]
-   hostport         host [ : port ]
-   host             hostname | hostnumber
-   cellname         hostname
-   hostname         ialpha [ . hostname ]
-   hostnumber       digits . digits . digits . digits
-   port             digits
-   selector         path
-   path             void | xpalphas [ / path ]
-   search           xalphas [ + search ]
-   user             xalphas
-   fragmentid       xalphas
-   gtype            xalpha
-   xalpha           alpha | digit | safe | extra | escape
-   xalphas          xalpha [ xalphas ]
-   xpalpha          xalpha | +
-   xpalphas         xpalpha [ xpalphas ]
-   ialpha           alpha [ xalphas ]
-   alpha            a | b | c | d | e | f | g | h | i | j | k | l
-                    | m | n | o | p | q | r | s | t | u | v | w
-                    | x | y | z | A | B | C | D | E | F | G | H
-                    | I | J | K | L | M | N | O | P | Q | R | S
-                    | T | U | V | W | X | Y | Z
-   digit            0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
-   safe             $ | - | _ | @ | . | &
-   extra            ! | * | " | ' | ( | ) | : | ; | , | space
-   escape           % hex hex
-   hex              digit | a | b | c | d | e | f | A | B | C | D
-                    | E | F
-   variant          { | } | | | | | [ | ] | \ | ^ | ~
-   punctuation      < | >
-   digits           digit [ digits ]
-   alphanum         alpha | digit
-   alphanums        alphanum [ alphanums ]
-   void
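-
- To show how the productions compose, the following Python
- sketch transcribes those needed for httpaddress into regular
- expressions. It is a recognizer only, not a reference
- implementation. The function name is_http_address is
- hypothetical, xpalphas is read as one or more xpalpha
- characters (an assumption), and the grammar's "space" is taken
- literally as a space character.
-
```python
import re

SAFE = "$-_@.&"
EXTRA = "!*\"'():;, "  # the trailing blank is the grammar's "space"

# xalpha: alpha | digit | safe | extra | escape
XALPHA = "(?:[A-Za-z0-9" + re.escape(SAFE + EXTRA) + "]|%[0-9A-Fa-f]{2})"
# xpalpha: xalpha | +
XPALPHA = "(?:" + XALPHA + r"|\+)"
# ialpha: alpha [ xalphas ]
IALPHA = "[A-Za-z]" + XALPHA + "*"
# hostname: ialpha [ . hostname ]
HOSTNAME = IALPHA + r"(?:\." + IALPHA + ")*"
# hostnumber: digits . digits . digits . digits
HOSTNUMBER = r"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+"
# hostport: host [ : port ]
HOSTPORT = "(?:" + HOSTNAME + "|" + HOSTNUMBER + ")(?::[0-9]+)?"
# path: void | xpalphas [ / path ]
PATH = "(?:" + XPALPHA + "+(?:/" + XPALPHA + "+)*/?)?"
# search: xalphas [ + search ]
SEARCH = XALPHA + r"+(?:\+" + XALPHA + "+)*"
# httpaddress: h t t p : / / hostport [ / path ] [ ? search ]
HTTPADDRESS = "http://" + HOSTPORT + "(?:/" + PATH + r")?(?:\?" + SEARCH + ")?"

_HTTP_RE = re.compile(HTTPADDRESS)


def is_http_address(text: str) -> bool:
    """Return True if the whole string matches the httpaddress production."""
    return _HTTP_RE.fullmatch(text) is not None


print(is_http_address("http://info.cern.ch/hypertext/WWW/TheProject.html"))
```
-
- Because xalpha itself contains "." and ":", the grammar is
- ambiguous about where a hostname ends. The sketch therefore
- only accepts or rejects a string and does not attempt to split
- it into parts, and it may backtrack heavily on long malformed
- input.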
-
-
- Security considerations
-
- The URL scheme does not in itself pose a security threat.
- Users should beware that there is no general guarantee that a
- URL which at one time points to a given object continues to do
- so, or that it will not at some later time point to a different
- object because objects have been moved on servers.
-
- Conclusion
-
- A need has been demonstrated, and a number of requirements
- have been stated for uniform resource locators (URLs). A
- scheme has been proposed which builds on existing conventions
- to define a syntax for URLs. This scheme has been in serious
- use by the World-Wide Web (W3) initiative since 1991. Adoption of
- the scheme in correspondence, standards and software will ease
- the use of references to on-line information in a flexible way
- as the coming information age arrives.
-
- Acknowledgements
-
- This paper builds on the basic W3 design and much discussion
- of these issues by many people on the network. The discussion
- was particularly stimulated by articles by Clifford Lynch
- (1991), Brewster Kahle (1991) and Wengyik Yeong (1991b).
- Contributions from John Curran (NEARnet), Clifford Neuman
- (ISI), Ed Vielmetti (MSEN) and later the IETF URL BOF and URI
- working group have been incorporated into this issue of this
- paper.
- The draft url4 (Internet draft 00) was generated from url3
- following discussion and overall approval of the URL working
- group on 29 March 1993. The paper url3 had been generated from
- udi2 in the light of discussion at the UDI BOF meeting at the
- Boston IETF in July 1992. Draft url4 was Internet Draft 00.
- Draft url5 incorporated changes suggested by Clifford Neuman,
- and draft url6 (ID 01) incorporated character group changes
- and a few other fixes defined by the IETF URI WG in submitting
- it as a proposed standard.
-
- References
-
- Alberti, R., et al. (1991) "Notes on the Internet Gopher
- Protocol" University of Minnesota, December 1991,
- URL=<ftp://boombox.micro.umn.edu/pub/gopher/gopher_protocol>.
- See also
- URL=<gopher://gopher.micro.umn.edu/00/Information%20About%20Gopher/About%20Gopher>
- Berners-Lee, T., (1991) "Hypertext Transfer Protocol (HTTP)",
- CERN, December 1991,
- URL=<ftp://info.cern.ch./pub/www/doc/http-spec.txt>
- Davis, F, et al., (1990) "WAIS Interface Protocol: Prototype
- Functional Specification", Thinking Machines Corporation,
- April 23, 1990
- URL=<ftp://quake.think.com/pub/wais/doc/protspec.txt>
- International Standards Organization, (1991) Information and
- Documentation - Search and Retrieve Application Protocol
- Specification for Open Systems Interconnection, ISO-10163
- Huitema, C., (1991) "Naming: strategies and techniques",
- Computer Networks and ISDN Systems 23 (1991) 107-110.
- Kahle, Brewster, (1991) "Document Identifiers, or
- International Standard Book Numbers for the Electronic
- Age", URL=<ftp://quake.think.com/pub/wais/doc/doc-ids.txt>
- Kantor, B., and Lapsley, P., (1986) "A proposed standard for
- the stream-based transmission of news", Internet RFC-977,
- February 1986. URL=<ftp://nnsc.nsf.net/rfc/rfc977.txt>
- Lynch, C., Coalition for Networked Information: (1991)
- "Workshop on ID and Reference Structures for Networked
- Information", November 1991. See
- URL=<wais://quake.think.com/wais-discussion-archives?lynch>
- Mockapetris, P., (1987) "Domain names - concepts and
- facilities", RFC-1034, USC-ISI, November 1987,
- URL=<ftp://nnsc.nsf.net/rfc/rfc1034.txt>
- Neuman, B. Clifford, (1992) "Prospero: A Tool for Organizing
- Internet Resources", Electronic Networking: Research,
- Applications and Policy, Vol 1 No 2, Meckler Westport CT
- USA. See also
- URL=<ftp://prospero.isi.edu/pub/prospero/oir.ps>
- Postel, J. and Reynolds, J. (1985) "File Transfer Protocol
- (FTP)", Internet RFC-959, October 1985.
- URL=<ftp://nnsc.nsf.net/rfc/rfc959.txt>
- Yeong, W., (1991a) "Towards Networked Information Retrieval",
- Technical report 91-06-25-01, June 1991, Performance
- Systems International, Inc.
- URL=<ftp://uu.psi.com/wp/nir.txt>
- Yeong, W., (1991b), "Representing Public Archives in the
- Directory", Internet Draft, November 1991. In
- URL=<wais://nnsc.nsf.net/internet-drafts?yeong>. Work in
- progress.
-
- Author's address
-
-
- Tim Berners-Lee
- World-Wide Web project
- CERN, 1211 Geneva 23, Switzerland
- +41 (22)767 3755
- timbl@info.cern.ch